Results 1 - 20 of 43
1.
Genome Biol Evol ; 6(10): 2721-30, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25260584

ABSTRACT

Prototype galectins, endogenously expressed animal lectins with a single carbohydrate recognition domain, are well-known regulators of tissue properties such as growth and adhesion. The earliest discovered and best studied of the prototype galectins is Galectin-1 (Gal-1). In the Gallus gallus (chicken) genome, Gal-1 is represented by two homologs: Gal-1A and Gal-1B, with distinct biochemical properties, tissue expression, and developmental functions. We investigated the origin of the Gal-1A/Gal-1B divergence to gain insight into when their developmental functions originated and how they could have contributed to vertebrate phenotypic evolution. Sequence alignment and phylogenetic tree construction showed that the Gal-1A/Gal-1B divergence can be traced back to the origin of the sauropsid lineage (consisting of extinct and extant reptiles and birds) although lineage-specific duplications also occurred in the amphibian and actinopterygian genomes. Gene synteny analysis showed that sauropsid gal-1b (the gene for Gal-1B) and its frog and actinopterygian gal-1 homologs share a similar chromosomal location, whereas sauropsid gal-1a has translocated to a new position. Surprisingly, we found that chicken Gal-1A, encoded by the translocated gal-1a, was more similar in its tertiary folding pattern than Gal-1B, encoded by the untranslocated gal-1b, to experimentally determined and predicted folds of nonsauropsid Gal-1s. This inference is consistent with our finding of a lower proportion of conserved residues in sauropsid Gal-1Bs, and evidence for positive selection of sauropsid gal-1b, but not gal-1a genes. We propose that the duplication and structural divergence of Gal-1B away from Gal-1A led to specialization in both expression and function in the sauropsid lineage.


Subjects
Galectins/chemistry; Vertebrates/classification; Animals; Galectins/genetics; Phylogeny; Protein Structure, Secondary; Vertebrates/genetics
2.
Integr Biol (Camb) ; 3(4): 350-67, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21424025

ABSTRACT

In this Perspective, we propose that communication theory--a field of mathematics concerned with the problems of signal transmission, reception and processing--provides a new quantitative lens for investigating multicellular biology, ancient and modern. What underpins the cohesive organisation and collective behaviour of multicellular ecosystems such as microbial colonies and communities (microbiomes) and multicellular organisms such as plants and animals, whether built of simple tissue layers (sponges) or of complex differentiated cells arranged in tissues and organs (members of the 35 or so phyla of the subkingdom Metazoa)? How do mammalian tissues and organs develop, maintain their architecture, become subverted in disease, and decline with age? How did single-celled organisms coalesce to produce many-celled forms that evolved and diversified into the varied multicellular organisms in existence today? Some answers can be found in the blueprints or recipes encoded in (epi)genomes, yet others lie in the generic physical properties of biological matter such as the ability of cell aggregates to attain a certain complexity in size, shape, and pattern. We suggest that Lasswell's maxim "Who says what to whom in what channel with what effect" provides a foundation for understanding not only the emergence and evolution of multicellularity, but also the assembly and sculpting of multicellular ecosystems and many-celled structures, whether of natural or human-engineered origin. We explore how the abstraction of communication theory as an organising principle for multicellular biology could be realised. We highlight the inherent ability of communication theory to be blind to molecular and/or genetic mechanisms. We describe selected applications that analyse the physics of communication and use energy efficiency as a central tenet. 
Whilst communication theory has contributed and could contribute to understanding a myriad of problems in biology, investigations of multicellular biology could, in turn, lead to advances in communication theory, especially in the still immature field of network information theory.


Subjects
Biological Evolution; Cell Communication/physiology; Information Theory; Aging/physiology; Algorithms; Animals; Body Patterning/physiology; Chemotaxis/physiology; Chromosomes/physiology; Dictyosteliida/physiology; Female; Genetic Code/physiology; Genetic Phenomena/physiology; Growth and Development/physiology; Humans; Mammary Glands, Animal/growth & development; Pheromones/metabolism; Polysaccharides/physiology; Quorum Sensing/physiology; Saccharomyces cerevisiae/physiology; Signal Transduction/physiology; Spindle Apparatus/physiology
3.
BMC Bioinformatics ; 7: 250, 2006 May 08.
Article in English | MEDLINE | ID: mdl-16681860

ABSTRACT

BACKGROUND: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words. RESULTS: An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification. Corpus-, document-, and word-level LDA parameters were combined with terms from the Gene Ontology to enhance the explanatory value of the CGC LDA model, and to suggest additional candidates for age-related genes. A novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated and used to search the CGC database for "homologs" of a "query" document discussing the life span-modifying clk-2 gene. Inspection of these document homologs enabled and facilitated the production of hypotheses about the function and role of clk-2. CONCLUSION: Like other graphical models for genetic, genomic and other types of biological data, LDA provides a method for extracting unanticipated insights and generating predictions amenable to subsequent experimental validation.
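The document "homolog" search described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure was used, so the Jensen-Shannon divergence below is an assumption chosen only because it is a common way to compare points on a probability simplex. The document identifiers and topic proportions are hypothetical; in practice they would come from the per-document posterior of a fitted LDA model.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the Kullback-Leibler sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_homologs(query, corpus):
    """Return (doc_id, divergence) pairs, most similar document first."""
    scores = [(doc_id, jensen_shannon(query, theta))
              for doc_id, theta in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical topic proportions for three documents over three topics.
corpus = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.10, 0.10, 0.80],
    "doc_c": [0.65, 0.25, 0.10],
}
query = [0.70, 0.20, 0.10]  # topic proportions of the "query" document
ranking = rank_homologs(query, corpus)
```

A smaller divergence means the two documents distribute their words over topics in a more similar way, which is the sense in which one document can be treated as a "homolog" of another.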


Subjects
Caenorhabditis elegans Proteins/genetics; Caenorhabditis elegans/genetics; Databases, Bibliographic; Information Storage and Retrieval; Longevity/genetics; Models, Statistical; Telomere-Binding Proteins/genetics; Animals; Bayes Theorem; Natural Language Processing; Pattern Recognition, Automated; Terminology as Topic; Vocabulary, Controlled
4.
BMC Bioinformatics ; 7: 147, 2006 Mar 16.
Article in English | MEDLINE | ID: mdl-16542449

ABSTRACT

BACKGROUND: Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. RESULTS: Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models and each ensuing model utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. 
Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology/signal transduction and nucleic acid binding/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity. CONCLUSION: Given a set of genes and/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.
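The unanimous voting step mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the partitionings are supplied directly rather than obtained from fitted finite mixture models, and the gene names `g1` to `g5` and the function name `consensus_clusters` are hypothetical.

```python
from collections import defaultdict


def consensus_clusters(partitionings):
    """Group items that share a cluster in every partitioning (unanimous vote).

    partitionings: list of dicts mapping item -> cluster label. Every dict is
    assumed to cover the same items; labels need not match across dicts.
    """
    groups = defaultdict(list)
    for item in sorted(partitionings[0]):
        # Two items co-occur in all partitionings exactly when their
        # tuples of labels are identical.
        signature = tuple(p[item] for p in partitionings)
        groups[signature].append(item)
    return sorted(groups.values())


# Three hypothetical partitionings of five gene attribute profiles.
partitionings = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1},
    {"g1": "a", "g2": "a", "g3": "b", "g4": "b", "g5": "a"},
    {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 0},
]
clusters = consensus_clusters(partitionings)
```

Here `g5` ends up alone because the second partitioning separates it from `g3` and `g4`; a single dissenting model is enough to split a group under a unanimous rule.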


Subjects
Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Sequence Analysis, DNA/methods; Sequence Alignment/methods
5.
Mech Ageing Dev ; 126(1): 193-208, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15610779

ABSTRACT

The diverse nature of cancer- and aging-related genes presents a challenge for large-scale studies based on molecular sequence and profiling data. An underexplored source of data for modeling and analysis is the textual descriptions and annotations present in curated gene-centered biomedical corpora. Here, 450 genes designated by surveys of the scientific literature as being associated with cancer and aging were analyzed using two complementary approaches. The first, ensemble attribute profile clustering, is a recently formulated, text-based, semi-automated data interpretation strategy that exploits ideas from statistical information retrieval to discover and characterize groups of genes with common structural and functional properties. Groups of genes with shared and unique Gene Ontology terms and protein domains were defined and examined. Human homologs of a group of known Drosophila aging-related genes are candidates for genes that may influence lifespan (hep/MAPK2K7, bsk/MAPK8, puc/LOC285193). These JNK pathway-associated proteins may specify a molecular hub that coordinates and integrates multiple intra- and extracellular processes via space- and time-dependent interactions with proteins in other pathways. The second approach, a qualitative examination of the chromosomal locations of 311 human cancer- and aging-related genes, provides anecdotal evidence for a "phenotype position effect": genes that are proximal in the linear genome often encode proteins involved in the same phenomenon. Comparative genomics was employed to enhance understanding of several genes, including open reading frames, identified as new candidates for genes with roles in aging or cancer. Overall, the results highlight fundamental molecular and mechanistic connections between progenitor/stem cell lineage determination, embryonic morphogenesis, cancer, and aging.
Despite diversity in the nature of the molecular and cellular processes associated with these phenomena, they seem related to the architectural hub of tissue polarity and a need to generate and control this property in a timely manner.


Subjects
Aging/genetics; Algorithms; Databases, Genetic; Genes; Neoplasms/genetics; Proteins/genetics; Computational Biology/methods
6.
J Comput Biol ; 11(6): 1073-89, 2004.
Article in English | MEDLINE | ID: mdl-15662199

ABSTRACT

Molecular profiling studies can generate abundance measurements for thousands of transcripts, proteins, metabolites, or other species in, for example, normal and tumor tissue samples. Treating such measurements as features and the samples as labeled data points, sparse hyperplanes provide a statistical methodology for classifying data points into one of two categories (classification and prediction) and defining a small subset of discriminatory features (relevant feature identification). However, this and other extant classification methods address only implicitly the issue of observed data being a combination of underlying signals and noise. Recently, robust optimization has emerged as a powerful framework for handling uncertain data explicitly. Here, ideas from this field are exploited to develop robust sparse hyperplanes, i.e., classification and relevant feature identification algorithms that are resilient to variation in the data. Specifically, each data point is associated with an explicit data uncertainty model in the form of an ellipsoid parameterized by a center and covariance matrix. The task of learning a robust sparse hyperplane from such data is formulated as a second order cone program (SOCP). Gaussian and distribution-free data uncertainty models are shown to yield SOCPs that are equivalent to the SOCP based on ellipsoidal uncertainty. The real-world utility of robust sparse hyperplanes is demonstrated via retrospective analysis of breast cancer related transcript profiles. Data-dependent heuristics are used to compute the parameters of each ellipsoidal data uncertainty model. The generalization performance of a specific implementation, designated "robust LIKNON," is better than its nominal counterpart. Finally, the strengths and limitations of robust sparse hyperplanes are discussed.
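To see why ellipsoidal uncertainty leads to a second order cone constraint, it may help to compute the worst-case margin of a fixed hyperplane over one ellipsoid. The sketch below assumes the ellipsoid is the set of points center + cov^(1/2) u with ||u|| <= 1; the paper's exact parameterization may differ, and the function name and numbers are hypothetical. Under that assumption the smallest value of label * (w.x + b) over the ellipsoid is label * (w.center + b) - sqrt(w^T cov w), and requiring it to exceed a threshold is a second order cone constraint in w and b.

```python
import math


def worst_case_margin(w, b, center, cov, label):
    """Smallest value of label * (w.x + b) over an ellipsoid of data points.

    The ellipsoid is assumed to be {center + cov^(1/2) u : ||u|| <= 1},
    with cov a symmetric positive semidefinite matrix (list of rows)
    and label equal to +1 or -1.
    """
    nominal = label * (sum(wi * ci for wi, ci in zip(w, center)) + b)
    n = len(w)
    # Quadratic form w^T cov w; its square root equals ||cov^(1/2) w||.
    quad = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return nominal - math.sqrt(quad)


# Hypothetical two-feature example with a unit-ball uncertainty set.
w, b = [3.0, 4.0], 0.0
center = [1.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
robust = worst_case_margin(w, b, center, identity, label=1)
nominal = worst_case_margin(w, b, center, [[0.0, 0.0], [0.0, 0.0]], label=1)
```

With zero covariance the function returns the ordinary (nominal) margin, so the difference between the two values shows how much margin the uncertainty model removes.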


Subjects
Computational Biology; Sequence Analysis, DNA/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Data Interpretation, Statistical; Female; Genes, BRCA1; Genes, BRCA2; Humans
7.
Mech Ageing Dev ; 124(1): 109-14, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12618013

ABSTRACT

Transcript profiling can be used to elucidate the molecular and cellular mechanisms involved in ageing and cancer. A recent study of human gastrointestinal stromal tumours (GISTs) with mutations in the KIT gene, Cancer Res. 61 (2001) 8624, exemplifies a common type of investigation. cDNA microarrays were used to generate measurements for 1987 clones in two types of tissues: 13 KIT mutation-positive GISTs and 6 spindle cell tumours from locations outside the gastrointestinal tract. Statistical problems associated with such two-class, high-dimensional profiling data include simultaneous classification and relevant feature identification, probabilistic clustering and protein sequence family modelling. Here, the GIST data were reexamined using specific solutions to these problems, namely sparse hyperplanes, naïve Bayes models and profile hidden Markov models respectively. The integrated analysis of molecular profiling and sequence data highlighted 6 clones that may be of clinical and experimental interest. The protein encoded by one of these putative biomarkers defined a novel protein family present in diverse eucarya. The family may be involved in chromosome segregation and/or stability. One family member is a potential biomarker identified recently from a retrospective analysis of transcript profiles for sporadic breast cancer samples from patients with poor and good prognosis, Signal Process. (in press).


Subjects
Gene Expression Profiling/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data; Amino Acid Sequence; Animals; Bayes Theorem; Carcinoma/genetics; Cluster Analysis; Data Interpretation, Statistical; Gastrointestinal Neoplasms/genetics; Humans; Markov Chains; Models, Statistical; Molecular Sequence Data; Mutation; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Proto-Oncogene Proteins c-kit/genetics; Sequence Homology, Amino Acid; Transcription, Genetic
8.
Radiat Res ; 158(5): 568-80, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12385634

ABSTRACT

We have developed a theoretical model for evaluating radiation-induced chromosomal exchanges by explicitly taking into account interphase (G0/G1) chromosome structure, nuclear organization of chromosomes, the production of double-strand breaks (DSBs), and the subsequent rejoinings in a faithful or unfaithful manner. Each of the 46 chromosomes for human lymphocytes (40 chromosomes for mouse lymphocytes) is modeled as a random polymer inside a spherical volume. The chromosome spheres are packed randomly inside a spherical nucleus with an allowed overlap controlled by a parameter Omega. The rejoining of DSBs is determined by a Monte Carlo procedure using a Gaussian proximity function with an interaction range parameter sigma. Values of Omega and sigma have been found which yield calculated results of interchromosomal aberration frequencies that agree with a wide range of experimental data. Our preferred solution is one with an interaction range of 0.5 µm coupled with a relatively small overlap parameter of 0.675 µm, which more or less confirms previous estimates. We have used our model with these parameter values and with resolution or detectability limits to calculate yields of translocations and dicentrics for human lymphocytes exposed to low-LET radiation that agree with experiments in the dose range 0.09 to 4 Gy. Five different experimental data sets have been compared with the theoretical results. Essentially all of the experimental data fall between theoretical curves corresponding to resolution limits of 1 Mbp and 20 Mbp, which may reflect the fact that different investigators use different limits for sensitivity or detectability. Translocation yields for mouse lymphocytes have also been calculated and are in good agreement with experimental data from 1 cGy to 10 cGy. There is also good agreement with recent data on complex aberrations.
Our model is expected to be applicable to both low- and high-LET radiation, and we include a sample prediction of the yield of interchromosomal rejoining in the dose range 0.22 Gy to 2 Gy of 1000 MeV/nucleon iron particles. This dose range corresponds to average particle traversals per nucleus ranging from 1.0 to 9.12.
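The Monte Carlo rejoining step described in this abstract can be sketched in a few lines. The abstract says only that a Gaussian proximity function with an interaction range parameter sigma is used, so the functional form exp(-d^2 / (2 sigma^2)), the function names, and the coordinates below are all assumptions made for illustration; distances are taken to be in micrometres.

```python
import math
import random


def proximity_weight(distance, sigma):
    """Assumed Gaussian proximity function: exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-distance ** 2 / (2 * sigma ** 2))


def choose_partner(free_end, candidates, sigma, rng):
    """Pick the index of a rejoining partner for one DSB free end.

    Each candidate is chosen with probability proportional to its
    proximity weight. random.choices raises ValueError if every
    weight underflows to zero (all candidates far beyond sigma).
    """
    weights = [proximity_weight(math.dist(free_end, c), sigma)
               for c in candidates]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]


# Hypothetical free end with one nearby and one distant candidate.
rng = random.Random(0)
free_end = (0.0, 0.0, 0.0)
candidates = [(0.1, 0.0, 0.0), (2.0, 0.0, 0.0)]
picks = [choose_partner(free_end, candidates, sigma=0.5, rng=rng)
         for _ in range(1000)]
near_count = picks.count(0)
```

With sigma = 0.5 the distant candidate has a weight of about exp(-8), so nearly every draw selects the nearby end; widening sigma makes exchanges between distant breaks more likely.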


Subjects
Chromosome Aberrations/radiation effects; Chromosomes, Mammalian/radiation effects; Interphase/radiation effects; Models, Theoretical; Animals; Cell Nucleus/genetics; Cell Nucleus/radiation effects; Chromosome Breakage; DNA Damage/radiation effects; Humans; Lymphocytes/metabolism; Lymphocytes/radiation effects; Mathematics; Mice; Radiation Dosage; Recombination, Genetic/radiation effects
9.
Curr Biol ; 11(21): 1706-10, 2001 Oct 30.
Article in English | MEDLINE | ID: mdl-11696330

ABSTRACT

An important quest in modern biology is to identify genes involved in aging. Model organisms such as the nematode Caenorhabditis elegans are particularly useful in this regard. The C. elegans genome has been sequenced [1], and single gene mutations that extend adult life span have been identified [2]. Among these longevity-controlling loci are four apparently unrelated genes that belong to the clk family. In mammals, telomere length and structure can influence cellular, and possibly organismal, aging. Here, we show that clk-2 encodes a regulator of telomere length in C. elegans.


Subjects
Aging/genetics; Caenorhabditis elegans Proteins/genetics; Genes, Helminth; Saccharomyces cerevisiae Proteins; Telomere-Binding Proteins; Telomere/genetics; Amino Acid Sequence; Animals; DNA-Binding Proteins/genetics; Molecular Sequence Data; Mutation; RNA, Antisense; RNA, Small Interfering; Radiation Tolerance; Sequence Homology, Amino Acid; X-Rays
10.
Mol Cell Biol ; 21(16): 5591-604, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11463840

ABSTRACT

SATB1 is expressed primarily in thymocytes and orchestrates temporal and spatial expression of a large number of genes in the T-cell lineage. SATB1 binds to the bases of chromatin loop domains in vivo, recognizing a special DNA context with strong base-unpairing propensity. The majority of thymocytes are eliminated by apoptosis due to selection processes in the thymus. We investigated the fate of SATB1 during thymocyte and T-cell apoptosis. Here we show that SATB1 is specifically cleaved by a caspase 6-like protease at amino acid position 254 to produce a 65-kDa major fragment containing both a base-unpairing region (BUR)-binding domain and a homeodomain. We found that this cleavage separates the DNA-binding domains from amino acids 90 to 204, a region which we show to be a dimerization domain. The resulting SATB1 monomer loses its BUR-binding activity, despite containing both its DNA-binding domains, and rapidly dissociates from chromatin in vivo. We found this dimerization region to have sequence similarity to PDZ domains, which have been previously shown to be involved in signaling by conferring protein-protein interactions. SATB1 cleavage during Jurkat T-cell apoptosis induced by an anti-Fas antibody occurs concomitantly with the high-molecular-weight fragmentation of chromatin of ~50-kb fragments. Our results suggest that mechanisms of nuclear degradation early in apoptotic T cells involve efficient removal of SATB1 by disrupting its dimerization and cleavage of genomic DNA into loop domains to ensure rapid and efficient disassembly of higher-order chromatin structure.


Subjects
Apoptosis/physiology; Caspases/physiology; Chromatin/physiology; DNA-Binding Proteins/physiology; Matrix Attachment Region Binding Proteins; T-Lymphocytes/pathology; T-Lymphocytes/physiology; Amino Acid Sequence; Caspase 6; DNA-Binding Proteins/chemistry; Dimerization; Humans; Jurkat Cells; Molecular Sequence Data; Substrate Specificity
11.
Mol Cell ; 7(6): 1201-11, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11430823

ABSTRACT

The key protein subunit of the telomerase complex, known as TERT, possesses a reverse transcriptase (RT)-like domain that is conserved in enzymes encoded by retroviruses and retroelements. Structural and functional analysis of HIV-1 RT suggests that RT processivity is governed, in part, by the conserved motif C, motif E, and a C-terminal domain. Mutations in analogous regions of the yeast TERT were found to have anticipated effects on telomerase processivity in vitro, suggesting a great deal of mechanistic and structural similarity between TERT and retroviral RTs, and a similarity that goes beyond the homologous domain. A close correlation was uncovered between telomerase processivity and telomere length in vivo, suggesting that enzyme processivity is a limiting factor for telomere maintenance.


Subjects
HIV Reverse Transcriptase/metabolism; Telomerase/metabolism; Telomere/enzymology; Catalytic Domain; DNA-Binding Proteins; In Vitro Techniques; Molecular Sequence Data; Mutagenesis; RNA; Sequence Homology, Amino Acid; Yeasts
12.
J Biol Chem ; 276(29): 27591-6, 2001 Jul 20.
Article in English | MEDLINE | ID: mdl-11353770

ABSTRACT

The Drosophila S3 ribosomal protein has important roles in both protein translation and DNA repair. In regards to the latter activity, it has been shown that S3 contains vigorous N-glycosylase activity for the removal of 8-oxoguanine residues in DNA that leaves baseless sites in their places. Drosophila S3 also possesses an apurinic/apyrimidinic (AP) lyase activity in which the enzyme catalyzes a beta-elimination reaction that cleaves phosphodiester bonds 3' and adjacent to an AP lesion in DNA. In certain situations, this is followed by a delta-elimination reaction that ultimately leads to the formation of a single nucleotide gap in DNA bordered by 5'- and 3'-phosphate groups. The human S3 protein, although 80% identical to its Drosophila homolog and shorter by only two amino acids, has only marginal N-glycosylase activity. Its lyase activity only cleaves AP DNA by a beta-elimination reaction, thus further distinguishing itself from the Drosophila S3 protein in lacking a delta-elimination activity. Using a hidden Markov model analysis based on the crystal structures of several DNA repair proteins, the enzymatic differences between Drosophila and human S3 were suggested by the absence of a conserved glutamine residue in human S3 that usually resides at the cleft of the deduced active site pocket of DNA glycosylases. Here we show that the replacement of the Drosophila glutamine by an alanine residue leads to the complete loss of glycosylase activity. Unexpectedly, the delta-elimination reaction at AP sites was also abrogated by a change in the Drosophila glutamine residue. Thus, a single amino acid change converted the Drosophila activity into one that is similar to that possessed by the human S3 protein. 
In support of this were experiments executed in vivo that showed that human S3 and the Drosophila site-directed glutamine-changed S3 performed poorly when compared with Drosophila wild-type S3 and its ability to protect a bacterial mutant from the harmful effects of DNA-damaging agents.


Subjects
DNA Repair; Guanine/analogs & derivatives; Guanine/metabolism; Ribosomal Proteins/metabolism; Amino Acid Sequence; Amino Acid Substitution; Animals; Base Sequence; Borohydrides/chemistry; Catalysis; DNA/metabolism; DNA Damage; DNA Primers; Drosophila; Humans; Mutagenesis, Site-Directed; Mutagens/toxicity; Ribosomal Proteins/chemistry; Ribosomal Proteins/genetics; Sequence Homology, Amino Acid
13.
Nucleic Acids Res ; 29(8): 1772-80, 2001 Apr 15.
Article in English | MEDLINE | ID: mdl-11292850

ABSTRACT

Yeast co-expressing rat APOBEC-1 and a fragment of human apolipoprotein B (apoB) mRNA assembled functional editosomes and deaminated C6666 to U in a mooring sequence-dependent fashion. The occurrence of APOBEC-1-complementing proteins suggested a naturally occurring mRNA editing mechanism in yeast. Previously, a hidden Markov model identified seven yeast genes encoding proteins possessing putative zinc-dependent deaminase motifs. Here, only CDD1, a cytidine deaminase, is shown to have the capacity to carry out C-->U editing on a reporter mRNA. This is only the second report of a cytidine deaminase that can use mRNA as a substrate. CDD1-dependent editing was growth phase regulated and demonstrated mooring sequence-dependent editing activity. Candidate yeast mRNA substrates were identified based on their homology with the mooring sequence-containing tripartite motif at the editing site of apoB mRNA and their ability to be edited by ectopically expressed APOBEC-1. Naturally occurring yeast mRNAs edited to a significant extent by CDD1 were, however, not detected. We propose that CDD1 be designated an orphan C-->U editase until its native RNA substrate, if any, can be identified and that it be added to the CDAR (cytidine deaminase acting on RNA) family of editing enzymes.


Subjects
Cytidine Deaminase/metabolism; RNA Editing; Yeasts/enzymology; APOBEC-1 Deaminase; Amino Acid Sequence; Animals; Base Sequence; Blotting, Western; Cytidine Deaminase/analysis; Cytidine Deaminase/chemistry; Cytidine Deaminase/genetics; Fluorescent Antibody Technique; Genetic Complementation Test; Humans; Kinetics; Markov Chains; Molecular Sequence Data; Open Reading Frames/genetics; Protein Structure, Tertiary; RNA Editing/genetics; RNA, Fungal/genetics; RNA, Fungal/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Rats; Recombinant Fusion Proteins/analysis; Recombinant Fusion Proteins/chemistry; Recombinant Fusion Proteins/metabolism; Sequence Alignment; Yeasts/genetics
14.
Physiol Genomics ; 5(2): 99-111, 2001 Mar 08.
Article in English | MEDLINE | ID: mdl-11242594

ABSTRACT

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined implement two existing and one novel feature relevance measures. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. 
Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
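The way this abstract combines three feature relevance experts (union and intersection of each expert's top-ranked genes) can be sketched as below. The relevance scores, gene names, and function names are hypothetical, and the sketch omits the SVM evaluation of nested gene subsets; it shows only the set arithmetic on top-k lists.

```python
def top_k(scores, k):
    """Return the set of k genes with the largest relevance scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def combine_experts(expert_scores, k):
    """Return (union, intersection) of the experts' top-k gene sets."""
    tops = [top_k(scores, k) for scores in expert_scores]
    return set.union(*tops), set.intersection(*tops)


# Hypothetical relevance scores from three experts for four genes.
expert_scores = [
    {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2},
    {"g1": 0.7, "g2": 0.1, "g3": 0.6, "g4": 0.2},
    {"g1": 0.5, "g2": 0.3, "g3": 0.2, "g4": 0.4},
]
union, intersection = combine_experts(expert_scores, k=2)
```

In the study itself k was 50, the union held 125 genes, and the intersection held three; the toy example has the same structure at a much smaller scale.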


Subjects
Biomarkers, Tumor/genetics; Gene Expression Profiling; Leukemia, Myeloid/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Transcription, Genetic/genetics; Acute Disease; Adenocarcinoma/diagnosis; Adenocarcinoma/genetics; Algorithms; B-Lymphocytes/metabolism; Bayes Theorem; Bone Marrow Cells/metabolism; Child; Chromosome Aberrations/genetics; Computational Biology/methods; Data Interpretation, Statistical; Female; Gene Expression Regulation, Neoplastic; Genetic Markers/genetics; Humans; Leukemia, Myeloid/diagnosis; Male; Organ Specificity; Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis; RNA, Neoplasm/analysis; RNA, Neoplasm/genetics; Reproducibility of Results; Sensitivity and Specificity; Sex Characteristics; T-Lymphocytes/metabolism
16.
Physiol Genomics ; 4(2): 109-126, 2000 Dec 18.
Article in English | MEDLINE | ID: mdl-11120872

ABSTRACT

A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50-200 top-ranked genes had the same or better generalization performance than the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as inferring biological networks.


Subjects
Gene Expression Profiling , Genes/genetics , Bayes Theorem , Humans , Genetic Models , Neoplasms/genetics
17.
Physiol Genomics ; 4(2): 127-135, 2000 Dec 18.
Article in English | MEDLINE | ID: mdl-11120873

ABSTRACT

A novel suite of analytical techniques and visualization tools is applied to 78 published transcription profiling experiments monitoring 5,687 Saccharomyces cerevisiae genes in studies examining cell cycle, responses to stress, and diauxic shift. A naive Bayes model discovered and characterized 45 classes of gene profile vectors. An enrichment measure quantified the association between these classes and specific external knowledge defined by four sets of categories to which genes can be assigned: 106 protein functions, 5 stages of the cell cycle, 265 transcription factors, and 16 chromosomal locations. Many of the 38 genes in class 42 are known to play roles in copper and iron homeostasis. The 17 uncharacterized open reading frames in this class may be involved in similar homeostatic processes; human homologs of two of them could be associated with as yet undefined disease states arising from aberrant metal ion regulation. The Met4, Met31, and Met32 transcription factors may play a role in coregulating genes involved in copper and iron metabolism. Extensions of the simple graphical model used for clustering to learning more complex models of genetic networks are discussed.
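
One common way to quantify how strongly a class of genes is associated with an annotation category is a hypergeometric tail probability. The sketch below assumes that measure purely for illustration; the abstract does not say which enrichment measure was actually used. The totals of 5,687 genes and 38 class members come from the abstract, while the category size (50) and overlap (12) are invented, and the function name is hypothetical.

```python
from math import comb

def hypergeom_tail(n_total, n_category, n_class, n_overlap):
    """P(X >= n_overlap) when n_class genes are drawn without replacement
    from n_total genes, n_category of which carry the annotation."""
    upper = min(n_category, n_class)
    numerator = sum(
        comb(n_category, k) * comb(n_total - n_category, n_class - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / comb(n_total, n_class)

# 5,687 genes and a 38-gene class are from the abstract; the category size
# and the overlap are made-up numbers for illustration only.
p_value = hypergeom_tail(n_total=5687, n_category=50, n_class=38, n_overlap=12)
print(p_value)
```

A small tail probability means the overlap would rarely arise by chance. With 45 classes scored against several hundred categories, some correction for multiple testing would normally be applied before reading much into any single value.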


Subjects
Copper/metabolism , Iron/metabolism , Saccharomyces cerevisiae/genetics , Bayes Theorem , Gene Expression Profiling , Fungal Gene Expression Regulation , Fungal Genes/genetics , Homeostasis , Genetic Models , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/metabolism
18.
Mol Cell Biol ; 20(14): 5196-207, 2000 Jul.
Article in English | MEDLINE | ID: mdl-10866675

ABSTRACT

Telomerase is a ribonucleoprotein reverse transcriptase responsible for the maintenance of one strand of telomere terminal repeats. The key protein subunit of the telomerase complex, known as TERT, possesses reverse transcriptase-like motifs that presumably mediate catalysis. These motifs are located in the C-terminal region of the polypeptide. Hidden Markov model-based sequence analysis revealed four conserved motifs, named GQ, CP, QFP, and T, in the N-terminal region of all TERTs. Point mutation analysis of conserved residues confirmed the functional importance of the GQ motif. In addition, the distinct phenotypes of the GQ mutants suggest that this motif may serve at least two distinct functions in telomere maintenance. Deletion analysis indicates that even the most N-terminal nonconserved region of yeast TERT (N region) is required for telomerase function. This N region exhibits a nonspecific nucleic acid binding activity that probably reflects an important physiologic function. Expression studies of various portions of the yeast TERT in Escherichia coli suggest that the N region and the GQ motif together may constitute a stable domain. We propose that all TERTs may have a bipartite organization, with an N-GQ domain connected to the other motifs through a flexible linker.


Subjects
RNA , Telomerase/genetics , Telomerase/metabolism , Amino Acid Sequence , Base Sequence , Binding Sites , Conserved Sequence , DNA-Binding Proteins , Endopeptidases/metabolism , Enzyme Stability , Molecular Sequence Data , Mutation , Nucleic Acids/metabolism , Amino Acid Sequence Homology
19.
Mol Vis ; 6: 30-9, 2000 Apr 07.
Article in English | MEDLINE | ID: mdl-10756179

ABSTRACT

PURPOSE: We compared the structure and function of interphotoreceptor retinoid-binding protein (IRBP) related proteins and predicted domain and secondary structure within each repeat of IRBP and its relatives. We tested whether tail specific protease (Tsp), which bears sequence similarity to IRBP Domain B, binds fatty acids or retinoids, and whether IRBP possesses protease activity resembling Tsp's catalytic function. These tests helped us to learn whether the primary sequence similarities of family members extended to higher order structural and functional levels. METHODS: Predictions were derived from multiple sequence alignments among IRBP and Tsp family members and from secondary structure computer programs. The first repeat of human IRBP (EcR1) and Tsp were expressed, purified, and tested for binding properties. Tsp was examined for fluorescence enhancement of retinol or 16-anthroyloxy-palmitic acid (16-AP) to test for ligand binding. IRBP was tested for protease activity. RESULTS: Tsp did not exhibit fluorescence enhancement with retinol or 16-AP. IRBP did not exhibit protease activity. The positions of critical residues needed for the ligand binding properties of retinol were predicted. Primary sequence and three-dimensional similarities were found between Domain A of IRBP Repeat 3 and eglin c. CONCLUSIONS: The sequence similarity of Tsp and IRBP raised the possibility that each might share the function of the other protein: IRBP might possess protease activity or Tsp might possess retinoid or fatty acid binding activity. Our studies do not support such a shared function hypothesis, and suggest that the sequence similarity is the result of maintenance of structure. The finding of similarity to eglin c in Domain A suggests the possibility of a tight interaction between Domain A and Domain B, possibly implying the need for Domain A in retinoid-binding, and suggesting that both Domains should be present in testing mutations. The positions of predicted critical amino acids suggest models in which a large binding pocket holds the retinoid or fatty acid ligand. These predictions are tested in a companion paper.


Subjects
Endopeptidases/chemistry , Eye Proteins , Retinol-Binding Proteins/chemistry , Cluster Analysis , Humans , Ligands , Markov Chains , Palmitic Acids/chemistry , Protein Binding , Protein Secondary Structure , Protein Tertiary Structure , Proteins , Sequence Alignment , Serpins/chemistry , Fluorescence Spectrometry , Vitamin A/chemistry
20.
Mol Vis ; 6: 40-50, 2000 Apr 07.
Article in English | MEDLINE | ID: mdl-10756180

ABSTRACT

PURPOSE: The purpose of this study was to measure the effects of mutations on the retinol binding capability of human Repeat 1 of interphotoreceptor retinoid-binding protein (IRBP). First, we predicted important functional amino acids by several computer programs. We also noted the lack of shared functions between Tail-specific protease (Tsp) and IRBP, which bear sequence similarity, and this aided in predicting functional residues. We analyzed the effects of point substitutions on the retinol and fatty acid binding properties of Repeat 1 of human IRBP at 25 and 50 degrees C. METHODS: To find residues critical to retinol binding that might affect function, a series of thirteen mutations was created by site-specific mutagenesis between positions 140 and 280 in Repeat 1 of human IRBP. These mutants were expressed, purified, and tested for binding properties. The conformations of the proteins were examined by circular dichroism (CD) scans. RESULTS: Seven of the mutations exhibited reduced binding capacity, and five were not expressed at high enough levels to assess binding activity. Four of the mutants were purified, and their CD scans were very similar to those of Repeat 1. Only one of the mutations did not affect binding, folding, or expression when compared to wild-type Repeat 1. CONCLUSIONS: Several IRBP mutants containing point mutations retained native structure but lost retinol binding function. The data suggest that retinol binding is affected by many different amino acid substitutions in or near a binding pocket. That even a single point substitution can profoundly affect binding without affecting overall conformation suggests that much of Domain B (from amino acid positions 80 to 300) is involved with ligand binding. This excludes three previously proposed IRBP-retinol binding mechanisms: (1) retinol binds to a small portion of the protein repeat, (2) retinol can bind to any hydrophobic patch in the protein, and (3) native conformation is not required for retinol binding to the repeat.


Subjects
Eye Proteins , Retinol-Binding Proteins/chemistry , Amino Acid Substitution , Binding Sites , Western Blotting , Buffers , Circular Dichroism , Endopeptidases/chemistry , Escherichia coli/metabolism , Humans , Site-Directed Mutagenesis , Point Mutation , Protein Denaturation , Protein Folding , Retinol-Binding Proteins/genetics , Retinol-Binding Proteins/isolation & purification , Retinol-Binding Proteins/metabolism , Fluorescence Spectrometry
...